405 research outputs found
Benchmarking High Performance Architectures With Natural Language Processing Algorithms
Natural Language Processing algorithms are resource demanding, especially when tuning to an inflective language like Polish is needed. The paper presents time and memory requirements of part-of-speech tagging and clustering algorithms applied to two corpora of the Polish language. The algorithms are benchmarked on three high performance platforms of different architectures. Additionally, sequential versions and OpenMP implementations of the clustering algorithms were compared.
Mobile Social Networks For Live Meetings
In this article, we present the idea of combining social networking websites with the capabilities of modern mobile devices to take social networking activity to a higher level. Nowadays, these devices and websites offer remote communication (phone calls, message exchange, etc.), which can potentially be used to notify people about meetings in the real world. Since current social network models (of which social networking websites are examples) do not provide enough information for such notification, a new social network model suitable for this application is proposed, and a new social platform based on mobile devices is introduced. This platform can notify users when their friends are nearby. The paper presents the model and a simulation that verifies the approach.
Comparison of Latent Semantic Analysis and Probabilistic Latent Semantic Analysis for Documents Clustering
In this paper we compare the usefulness of statistical dimensionality reduction techniques for improving the clustering of documents in Polish. We start with partitional and agglomerative algorithms applied to the Vector Space Model. Then we investigate two transformations: Latent Semantic Analysis and Probabilistic Latent Semantic Analysis. The results show an advantage of Latent Semantic Analysis over the probabilistic model. We also analyse the time and memory consumption of these transformations and present runtime details for an IBM BladeCenter HS21 machine.
Comparison of Information Representation Formalisms for Scalable File Agnostic Information Infrastructures
In the early days of computing, files were just a natural way of storing information, reflecting the way one would file punch cards in a cabinet drawer. Unfortunately, the requirement to fragment information into such chunks is a huge bottleneck for the evolution of the global information space that the Internet has become. The concept of a file causes several problems, including unnatural clustering of information, unnecessary replication of data and very expensive information discovery in distributed computing environments. The overall goal of this work is to design an architecture enabling a new era in computing and networking: a computing infrastructure without the concept of a file. Files are seen by many specialists as one of the main bottlenecks in the evolution of modern IT systems. This is mostly due to a very unnatural fragmentation of information into chunks which are easier for operating systems to manage but much more difficult for information processing tools, and eventually for humans themselves, to work with.
A Case Study of Algorithms for Morphosyntactic Tagging of Polish Language
The paper presents an evaluation of several part-of-speech taggers, representing the main tagging algorithms, applied to the corpus of the frequency dictionary of the contemporary Polish language. We report our results for two tagging schemes: the IPI PAN positional tagset and its simplified version. Tagging accuracy is calculated for different training sets and takes into account many subcategories (accuracy on known and unknown tokens, word segments, sentences, etc.). The results are compared with those for other inflecting and analytic languages. Performance aspects (time demands) of the tagging tools used are also discussed.
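As a rough illustration of the "accuracy on known and unknown tokens" breakdown mentioned in this abstract, the sketch below splits token-level accuracy by whether a token occurred in the training vocabulary. The function name, the input layout and the example tags are assumptions made for illustration, not the evaluation code used in the paper.

```python
def accuracy_by_vocabulary(tokens, gold_tags, predicted_tags, training_vocab):
    """Return tagging accuracy separately for known and unknown tokens.

    A token counts as "known" if it appears in training_vocab
    (hypothetical definition; the paper may define it differently).
    """
    # For each group keep [number of correct tags, number of tokens].
    counts = {"known": [0, 0], "unknown": [0, 0]}
    for token, gold, predicted in zip(tokens, gold_tags, predicted_tags):
        group = "known" if token in training_vocab else "unknown"
        counts[group][1] += 1
        if gold == predicted:
            counts[group][0] += 1
    # Accuracy is None for a group that has no tokens at all.
    return {
        group: (correct / total if total else None)
        for group, (correct, total) in counts.items()
    }


if __name__ == "__main__":
    # Toy data with made-up tags, only to show the call shape.
    result = accuracy_by_vocabulary(
        tokens=["kot", "pies", "dom", "las"],
        gold_tags=["subst", "subst", "subst", "subst"],
        predicted_tags=["subst", "subst", "adj", "adj"],
        training_vocab={"kot", "pies", "dom"},
    )
    print(result)
```

Reporting the two groups separately matters for inflective languages, where many word forms in the test data never occur in training.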
Resource Storage Management Model for Ensuring Quality of Service in the Cloud Archive Systems
Nowadays, service providers offer many IT services in public or private clouds. The client can buy various kinds of services, such as SaaS, PaaS, etc. Backup as a Service (BaaS) was recently introduced as a variety of SaaS. At the moment several different BaaS offerings are available for archiving data in the cloud, but they provide only a basic level of service quality. In the paper we propose a model which ensures QoS for BaaS, together with methods for managing storage resources aimed at achieving the required SLA. This model introduces a set of parameters that determine the SLA level, which can be offered at a basic or higher level of quality. The storage systems (typically HSM), which are distributed between several Data Centres, are built on disk arrays, VTLs, and tape libraries. The RSMM model does not assume bandwidth reservation or control, but rather focuses on the management of storage resources.
Increasing Quality of the Corpus of Frequency Dictionary of Contemporary Polish for Morphosyntactic Tagging of the Polish Language
The paper is devoted to correcting the erroneous and ambiguous corpus of the Frequency Dictionary of Contemporary Polish (FDCP) and to its application to morphosyntactic tagging of the Polish language. Several stages of corpus transformation are presented, and baseline part-of-speech tagging algorithms are evaluated.
Modelling Agents Cooperation Through Internal Visions of Social Network and Episodic Memory
Human societies appear in many types of simulations. In particular, many new computer games contain a virtual world that imitates the real world. Among the most important and most difficult elements of a society to model are the social context and cooperation between individuals. In this paper we show how the social context and the ability to cooperate can be provided using agents equipped with internal visions of mutual social relations. An internal vision is a representation of social relations from the agent's point of view, so, being subjective, it may be inconsistent with reality. We introduce the agent model and a mechanism for rebuilding the agent's internal vision that is similar to the one used by humans. An experimental proof-of-concept implementation is also presented.
Application of Weighted Voting Taggers to Languages Described with Large Tagsets
The paper presents baseline and complex part-of-speech taggers applied to the modified corpus of the Frequency Dictionary of Contemporary Polish, annotated with a large tagset. First, the paper examines the accuracy of 6 baseline part-of-speech taggers. The main part of the work presents simple weighted voting and complex voting taggers. Special attention is paid to lexical voting methods and to the issues of ties and fallbacks. The TagPair and WPDV voting methods achieve the top accuracy among all considered methods. The error reduction of 10.8% with respect to the best baseline tagger for the large tagset is comparable with other authors' results for small tagsets.
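A minimal sketch of simple weighted voting with a tie fallback, plus the usual relative error reduction formula, may help make the abstract concrete. It is not the TagPair or WPDV method from the paper; the tagger names, weights, tags and the tie-breaking rule (prefer the tag of a designated fallback tagger) are assumptions chosen for illustration.

```python
from collections import defaultdict


def weighted_vote(proposals, weights, fallback):
    """Pick one tag for a token from several taggers' proposals.

    proposals: dict mapping tagger name -> proposed tag
    weights:   dict mapping tagger name -> voting weight
               (for example, that tagger's accuracy on held-out data)
    fallback:  name of the tagger trusted when the vote is tied
               (hypothetical tie-breaking rule)
    """
    scores = defaultdict(float)
    for tagger, tag in proposals.items():
        scores[tag] += weights[tagger]
    best = max(scores.values())
    winners = [tag for tag, score in scores.items() if score == best]
    if len(winners) == 1:
        return winners[0]
    # Tie: use the fallback tagger's tag if it is among the winners,
    # otherwise choose deterministically by sorting the tied tags.
    if proposals[fallback] in winners:
        return proposals[fallback]
    return min(winners)


def error_reduction(baseline_accuracy, new_accuracy):
    """Relative error reduction: (E_base - E_new) / E_base."""
    baseline_error = 1.0 - baseline_accuracy
    new_error = 1.0 - new_accuracy
    return (baseline_error - new_error) / baseline_error


if __name__ == "__main__":
    weights = {"tagger_a": 0.9, "tagger_b": 0.8, "tagger_c": 0.7}
    proposals = {"tagger_a": "subst", "tagger_b": "adj", "tagger_c": "adj"}
    print(weighted_vote(proposals, weights, fallback="tagger_a"))
    print(error_reduction(0.90, 0.91))
```

Here two weaker taggers can outvote the strongest one when their combined weight is larger, which is the basic motivation for combining taggers at all.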
Database Replication for Disconnected Operations with Quasi Real-Time Synchronization
Database replication is a way to improve system throughput or achieve high availability. In most cases, using an active-passive replica architecture is efficient and easy to deploy. Such a system has CP properties (from the CAP theorem: Consistency, Availability and network Partition tolerance). Creating an AP (available and partition-tolerant) system requires multi-primary replication. Because of the many difficulties in implementation, this approach is not widely used. However, the CCDB (experiment conditions and calibration database) deployment needs to be an AP system in two locations. This necessity inspired an examination of the state of the art in this field and tests of the available solutions. The tests evaluate the performance of the chosen replication tools: Bucardo and EDB Replication Server. They show that the tested tools can be successfully used for continuous synchronization of two independent database instances.